HW #3 Drafting Viz

Author

Maxwell Patterson

Published

February 27, 2024

Overview

I plan to pursue the first option, where I will make 3 different graphics, each of which is tailored for a certain audience/purpose. While I like the idea of making an infographic with a fun theme and similar plots, I think this first option is the better route for me so I can practice targeting certain types of viewers (something our capstone group will be implementing so this is great practice!).

My question is: How are each of the teams in my fantasy basketball dyansty league constructed based on player age and fantasy scoring? Which teams are gunning for the championship this season, and which younger teams are set up for success in future years?

This question has changed a bit since the beginning of the quarter. Initially I wanted to also make some visuals on trades that have happened during this season in the fantasy league, but I realized that cleaning and tidying this data would take tons of time and not be as productive or useful for this assignment. So now I am just focusing on the fantasy team comparisons and breakdowns, highlighting age and fantasy points in the analysis.

Variables

Now I will dig into the variables I am interested in using from the data I have pulled from the Fantrax website where the league exists on. I’ll start by reading in the data.

# Clear environment for sanity
rm(list=ls())

# Import data
fantrax_data_raw <- read.csv(here::here('data/fantrax_02_05_24_vHW.csv'))

# Clean up data a bit
fantrax_data <- fantrax_data_raw %>% 
  janitor::clean_names()

# Define team names for enhanced visualization
teams <- c("STARKS", "SERP", "SDP", "Orcas", "maxpat01", "Jmarr237", "HBC", "GBRAYERS", "CCC", "BIGFOOTS", "BBB", "$C/$")

# Define updated team names for enhanced visualization
new_team_names <- c(STARKS = "Winterfell", SERP = "Slytherin", SDP = "San Diego",
                    Orcas = "Anacortes", maxpat01 = "Santa Barbara", Jmarr237 = "Malibu",
                    HBC = "Helsinki", GBRAYERS = "Scottsdale", CCC = "Cream City",
                    BIGFOOTS = "Beaverton", BBB = "Bikini Bottom", "$C/$" = "Las Vegas")

# Clean up column names and types
fantrax_data <- fantrax_data %>% 
  rename(fantasy_team = status) %>% 
  filter(fantasy_team %in% teams) %>% 
  mutate(
    adp = as.numeric(adp),
    x_d = as.numeric(gsub("%", "", x_d)) / 100, 
    ros = as.numeric(gsub("%", "", ros)) / 100,
    fantasy_team = case_when(
      fantasy_team %in% names(new_team_names) ~ new_team_names[fantasy_team],
      TRUE ~ fantasy_team  # Keeps the original name if not found 
    )
  ) %>% 
  select(-c(id, opponent, x_d, x, )) # Remove unnecessary columns

Now that the data has been imported and cleaned up a bit, I’ll dive into the columns that are most important for the assignment, and some additional wrangling and updating that will be needed in order to answer the question at hand.

# Display the dimensions of the data
paste0("Number of rows in the data: ", nrow(fantrax_data))
[1] "Number of rows in the data: 319"
paste0("Number of columns in the data: ", ncol(fantrax_data))
[1] "Number of columns in the data: 23"
# Display the column names of the data
col_names <- colnames(fantrax_data)
print(col_names) # UPDATE THIS
 [1] "player"       "team"         "position"     "rk_ov"        "fantasy_team"
 [6] "age"          "f_pts"        "fp_g"         "adp"          "ros"         
[11] "fgm"          "fga"          "x3ptm"        "ftm"          "fta"         
[16] "pts"          "reb"          "ast"          "st"           "blk"         
[21] "to"           "x3d"          "x2d"         

The columns in this cleaned data set that I will be using in the analysis player, fantasy_team, f_pts, fp_g, and age. Further manipulation and additions to the data will be done as needed to understand the relationships between age, fantasy points (f_pts), and fantasy points per game (fp_g) in the dynasty league.

There will be some important data wrangling done in order to create the visuals I have in mind. First, I will filter the data for only teams that are on fantasy rosters so it is easier to work with and understand the player data that actually matters. Next, I will create summary tables of the top 12 players average fantasy points and age for each fantasy roster for plotting purposes. Also, I will create age groups for each team (under 25, 25-29, and over 30) and create a summary table of each fantasy teams total fantasy points, and the fantasy points for each age group for the sunburst chart. All of the summary tables I will create will be fairly straightforward using the filitered Fantrax data for players on fantasy rosters.

Inspiration

Here are some visuals that I have taken inspiration from when designing charts for my project:

This image made me confident that I can show each of the teams total fantasy scoring in a sunburst plot, and that it won’t be too busy/cluttered. There are a ton of subgroups in the outer layers, so this graph inspired me to feel confident about building a sunburst plot of total fantasy points, then broken down by fantasy team, then broken down further by age group for each fantasy team.

Here is the second visual, a personal favorite of mine:

I just love the texture of this plot, and having the team logos makes it look really nice. I follow this user on Twitter and I feel like the graphics really paint a picture of the league. Also, I want to add annotations to by scatter plot, and I like the simplicity and intentional selection of only certain teams getting an annotation. In my scatter plot comparing the average fantasy points per game versus average age of the top 12 players on each team, I plan to incorporate annotations that speak on the draft capital of certain teams that either have a ton or very little draft capital. I

Preliminary code

# Calculate average age and average fp_g for each fantasy team 
# Manually add draft capital for each team 
team_averages <- fantrax_data %>%
  group_by(fantasy_team) %>%
  summarise(
    AverageAge = mean(age, na.rm = TRUE),
    AverageFpG = mean(fp_g, na.rm = TRUE)
  ) %>% 
  mutate(draft_capital = c(10, 25, 5, 7, 40, 30, 25, 10, 10, 40, 15, 18)) # Need to 
                                                                          # formalize

# Add team logos as points
team_averages$logo_path <- c("images/orcas.png", "images/bigfoots.png", 
                             "images/BBB.png", "images/crusaders.png",
                             "images/hounds.png", "images/scorpions.png",
                             "images/milkers.png", "images/pilots.png", 
                             "images/swell.png", "images/stotches.png",
                             "images/serpents.png", "images/starks.png")

# Define team logos 
logos <- data.frame(team = c("Anacortes", "Beaverton", "Bikini Bottom",
                             "Cream City", "Helsinki", "Las Vegas", "Malibu",
                             "San Diego", "Santa Barbara", "Slytherin", 
                             "Scottsdale", "Winterfell"),
                    logo = c("images/orcas.png", "images/bigfoots.png", 
                             "images/BBB.png", "images/crusaders.png",
                             "images/hounds.png", "images/scorpions.png",
                             "images/milkers.png", "images/pilots.png", 
                             "images/swell.png", "images/stotches.png",
                             "images/serpents.png", "images/starks.png"))

Plots

#1: Ridge plot (broad audience)

Here is the mock-up of the ridge plot. Sorry my handwriting isn’t the best, I was doing this on the plane and it was a little bumpy so the ridges aren’t as smooth as they could be.

knitr::include_graphics('images/ridge_sketch.png')

# Calculate the average age for each fantasy team
team_avg_age <- fantrax_data %>%
  group_by(fantasy_team) %>%
  summarise(AverageAge = mean(age, na.rm = TRUE))

# Merge the average age data with full data
fantrax_data <- fantrax_data %>%
  left_join(team_avg_age, by = "fantasy_team")

# Order the teams by average fp_g
team_order <- fantrax_data %>%
  group_by(fantasy_team) %>%
  summarise(AverageFpG = mean(fp_g, na.rm = TRUE)) %>%
  arrange(desc(AverageFpG)) %>%
  pull(fantasy_team)

# Make sure data is ordered correctly for plotting
fantrax_data$fantasy_team <- factor(fantrax_data$fantasy_team, levels = team_order)

# Create ridge plot
ridge_plot <- ggplot(fantrax_data, aes(x = fp_g, y = fantasy_team, fill = AverageAge)) + geom_density_ridges(
    aes(height = ..density..), 
    alpha = 0.5, 
    scale = 2, 
    rel_min_height = 0.02
  ) +
  scale_fill_gradientn(
    colors = c("green", "yellow", "red"), 
    limits = c(min(fantrax_data$AverageAge), max(fantrax_data$AverageAge)),
    name = "Average Age"
  ) +
  labs(title = "Fantasy Scoring Distribution by Team",
       x = "Fantasy Points per Game",
       y = "") +
  theme_ridges(grid = FALSE) +
  theme(
    text = element_text(family = "Courier"),
    plot.background = element_rect(fill = "antiquewhite"), # Placeholder 
    panel.background = element_blank(), 
    axis.title = element_text(size = 11), 
    axis.text.y = element_text(size = 10), 
  ) 

# Add team logos to ridge plot
ridge_logos <- ridge_plot + 
  geom_image(data = logos, aes(x = -20, y = team, image = logo), size = 0.07, inherit.aes = FALSE)

# Display ridge plot
ridge_logos
Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(density)` instead.
Picking joint bandwidth of 4.72

# Save ridge plot
ggsave("images/ridge.png", ridge_logos, width = 10, height = 6, dpi = 300)
Picking joint bandwidth of 4.72

#2: Sunburst plot (presentation)

Here is the sketched version:

# Updating data to add player age groups
fantrax_data <- fantrax_data %>%
  mutate(age_group = case_when(
    age < 25 ~ "Under 25",
    age >= 25 & age <= 29 ~ "25 to 29",
    age > 29 ~ "Over 30"
  )) %>% 
  mutate(age_group = factor(age_group, levels = c("Over 30", "25 to 29", "Under 25")))

# Combine total fantasy points by team
total_fpts_by_team <- fantrax_data %>%
  group_by(fantasy_team) %>%
  summarize(total_pts = sum(f_pts), .groups = 'drop') %>%
  arrange(desc(total_pts))

# Creating base plot
snbrst_0 <- ggplot(total_fpts_by_team) 

# Creating inner circle (total fantasy points in the whole league)
snbrst_1 <- snbrst_0 +
  geom_bar(data = total_fpts_by_team, aes(x = 1, y = total_pts), fill="darkgrey", stat = 'identity') +
  geom_text(aes(x=1, y=total_pts/2, label=paste('Total fantasy league points:', comma(total_pts))), color='white')

# Testing first polar coord flip 
# snbrst_1 + coord_polar('y')

# Adding stacked bar chart with each teams total fantasy points
snbrst_2 <- snbrst_1 +
  geom_bar(data = total_fpts_by_team, 
           aes(x = 2, y = total_pts),
           color = 'white', position = 'stack', stat = 'identity', size = 0.6) +
  geom_text(data = total_fpts_by_team, aes(label = paste(fantasy_team, total_pts), x = 2, y = total_pts), position = 'stack')
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
# Adding third column to break up teams into age groups
# Combine total fantasy points by age group for each team
total_fpts_by_age_team <- fantrax_data %>%
  group_by(fantasy_team, age_group) %>%
  summarize(total_pts = sum(f_pts), .groups = 'drop')

snbrst_3 <- snbrst_2 +
  geom_bar(data = total_fpts_by_age_team, 
           aes(x = 3, y = total_pts, fill = age_group), 
           color = 'white', position = 'stack', stat = 'identity', size = 0.6)

# Convert to polar coordinates to achieve the sunburst effect
snbrst_3 + coord_polar('y')

This plot is still a mess, I’ll need more time over the next week and a half to finish it up and to really understand how to code it. I think that the other two plots are pretty close to finished, but I’ll speak on this graph more later in the assignment.

#3: Scatter plot (domain expert)

Here is my mocked up version:

### Testing different player filter for the plot -- thinking top 12 will be best
# top_players_by_team <- fantrax_data %>%
#   group_by(fantasy_team) %>%
#   slice_max(order_by = fp_g, n = 12) %>%
#   ungroup()
# 
# team_averages_top12 <- top_players_by_team %>%
#   group_by(fantasy_team) %>%
#   summarise(
#     AverageAge = mean(age, na.rm = TRUE),
#     AverageFpG = mean(fp_g, na.rm = TRUE)
#   )
# 
# team_averages_top12$logo_path <- c("images/orcas.png", "images/bigfoots.png", 
#                              "images/BBB.png", "images/crusaders.png",
#                              "images/hounds.png", "images/scorpions.png",
#                              "images/milkers.png", "images/pilots.png", 
#                              "images/swell.png", "images/serpents.png", 
#                              "images/stotches.png", "images/starks.png")

# Create scatter plot with size based on draft_capital
scatter <- ggplot(team_averages, aes(x = AverageAge, y = AverageFpG)) +
  geom_image(aes(image = logo_path), size = 0.12, alpha = 0.7) +
  scale_size_continuous(range = c(3, 12)) + # Adjust size range for points
  labs(
    title = "Fantasy Scoring versus Age by Team",
    x = "Average Age",
    y = "Average Fantasy Points Per Game"
  ) +
  theme_bw() +
  theme(
    text = element_text(family = "Courier"),
    plot.background = element_rect(fill = "antiquewhite")
  ) +
  expand_limits(x = c(min(team_averages$AverageAge) - 1, max(team_averages$AverageAge) + 1),
                y = c(min(team_averages$AverageFpG) - 1, max(team_averages$AverageFpG) + 1)) # Stretch axes

# Display scatter plot
scatter

# Save scatter plot image
ggsave("images/scatter.png", scatter)
Saving 7 x 5 in image

Challenges and Future Work

Q: What challenges did you encounter or anticipate encountering as you continue to build / iterate on your visualizations in R?

My biggest issue is the sunburst plot. I need to spend more time on the code and understanding how to make it so the three age groups are directly outside of each fantasy team instead of grouped next to each other. I think I will look into other ways to make the plot in addition the polar coordinate method. I will be spending the majority of the time on the final assignment working on this, but I think my vision is clear and I just have to write the code to get it done.

The ridge plot is pretty close to finished — I’m happy with how the team logos look on the plot, but I need to spend time on choosing the right color palette to make it more accessible. I also played around with different images as the background, but I am now thinking that the court as the background is a little too busy. I want to incorporate some texture into this plot, similar to what is in the 2nd inspiration plot that I showed earlier.

I’ve spent a lot of time playing around with different player filters – selecting the top 10, 12, 15 or 20 players on each fantasy team to plot on the scatter plot. There is no clear best number that is best since each team’s roster construction and depth is different. Some teams have 10 really good players, some only have 6, some might have more. I think I will go with the top 12 teams, since 10 players are in the starting line-up each night. While this isn’t a techinical challenge, it will be important for me to keep playing around with these numbers to capture the things I want to reveal about the league in the scatter plot.

Q: What ggplot extension tools / packages do you need to use to build your visualizations? Are there any that we haven't covered in class that you'll be learning how to use for your visualizations?

The ridge plot and scatter plot are made using packages we have worked with in class, in addition to the ggimage package for adding the team logos into these two plots. The polar coordinate function is within the ggplot package, but we haven’t worked with these types of plots so it is new to me. I think I will try to make the sunburst plot with plotly. I want to spend more time with the sportyR package to see if there are little details and images I can use from the library to incorporate into the final assignment, shoutout to Sevan for pointing me to this. Overall, there isn’t anything too new or crazy in terms of libraries. I will add some annotations to the scatter plot to provide extra context (hence why it will be domain expert) about draft capital of each fantasy team.

Q: What feedback do you need from the instructional team and / or your peers to ensure that your intended message is clear?

I think I have a good sense of the things I want to improve for the final iteration of the assignment:

  • Actually getting the sunburst plot to look how I want it to

  • Adding sportyR visuals and aesthetics into the plots

  • Adding annotations to the scatter plot for more context on the league (draft capital or lack-thereof)

  • Updating the color palettes to be more inclusive of color-blind issues

I would appreciate feedback/insight on the best way to go about the sunburst plot and potential color gradients for the ridge plot.